Research in Astron. Astrophys. 2012 Vol. X No. XX, 000-000 
http://www. raa-journal. org http://www. iop. org/journals/raa 



Research in 
Astronomy and 
Astrophysics 



A Comparison of Approaches in Fitting Continuum SEDs 

Yao Liu 1,2 ' 3 , David Madlener 3 , Sebastian Wolf 3 and Hongchi Wang 1 

1 Purple Mountain Observatory, & Key Laboratory for Radio Astronomy, Chinese Academy of 
Sciences, 2 West Beijing Road, Nanjing 210008, China; yliu@pmo.ac.cn 

2 Graduate School of the Chinese Academy of Sciences, Beijing 100080, China 

3 Institut fiir Theoretische Physik und Astrophysik, Universitat zu Kiel, Leibnizstr. 15, 241 18 Kiel, 
Germany 



Received [year] [month] [day]; accepted [year] [month] [day] 



Abstract We present a detailed comparison of two approaches, the use of a pre-calculated 
database and simulated annealing (S A), for fitting the continuum spectral energy distribu- 
tion (SED) of astrophysical objects whose appearance is dominated by surrounding dust. 
While pre-calculated databases are commonly used to model SED data, only few studies 
to date employed SA due to its unclear accuracy and convergence time for this specific 
problem. From a methodological point of view, different approaches lead to different fit- 
ting quality, demand on computational resources and calculation time. We compare the 
fitting quality and computational costs of these two approaches for the task of SED fitting 
to provide a guide to the practitioner to find a compromise between desired accuracy and 
available resources. To reduce uncertainties inherent to real datasets, we introduce a refer- 
ence model resembling a typical circumstellar system with 10 free parameters. We derive 
the SED of the reference model with our code MC3D at 78 logarithmically distributed 
wavelengths in the range [0.3 /im, 1.3 mm] and use this setup to simulate SEDs for the 
database and SA. Our result shows directly the applicability of SA in the field of SED 
modeling, since the algorithm regularly finds better solutions to the optimization problem 
than a pre-calculated database. As both methods have advantages and shortcomings, a 
hybrid approach is preferable. While the database provides an approximate fit and overall 
probability distributions for all parameters deduced using Bayesian analysis, SA can be 
used to improve upon the results returned by the model grid. 

Key words: methods: numerical - radiative transfer - protoplanetary disks 
1 INTRODUCTION 

The continuum spectral energy distribution (SED) is an important observable of astrophysical sources 
embedded in a dusty environment such as young stellar objects (YSO), active galactic nuclei (AGN), 
and post-AGB stars. It allows one to probe the mass, composition, temperature, and spatial distribution 
of the dust. The common method of analysis is the comparison of available observations with pre- 
dictions derived from solving the radiative transfer self-consistently with a model describing the dust 
properties and its spatial distribution. The task is to find a parameter set that reproduces the observations 
best for a given model. This minimization of the discrepancy between observation and prediction is an 

optimization problem and normally called a fitting procedure. 

Various fitting algorithms have been proposed and implemented before (e.g.. iPress et al.|[l992l) . 
From a methodological point of view, the fitting approaches differ in the quality of the resulting fit and 
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demand on computational resources. The most common method is based on a pre-c alculated model 
database that is establishe d on a huge grid in a high-dimensional parameter space (e.g jRobitaille et all 
l2006UWoitke et al.ll2010h . Once the database is established, the optimum parameter set is readily iden- 
tified by evaluating the merit function, i.e. for our purpose the \ 2 -distribution. However, as the number 
of grid points increases substantially with the dimensionality of the parameter space, the model grid 
is always a compromise between finite computational resources and resolution. Simulated Annealing 
(SA) is a versatile optimization technique based on the Metropolis-Hastings al gorithm that can be used 
to search for the opt imum of a merit function in arbitrary dimensions (e.g.. iKirkpatrick et al J 1 1 9 83t 
lMadleneretaTll2012l) . The main idea is to construct a random walk through parameter space thereby 
improving the agreement between observation and prediction gradually by following the local topology 
of the merit function. A drawback of this method is that no upper bound for the step count to reach the 
global optimum can be given. Moreover, the sequential execution of the algorithm and possible slow 
convergence of the Markov chain can make SA time consuming. 

In the context of SED modeling, a pre-calculated database is quite commonly invoked to perform the 
task because it can not only provide an approximate fit b ut also enables to evaluate the overall uncertainty 
of all parameters by Bayesian analysis (lLav et al.ll 1997b . On the contrary, only a small sample of studies 
to date make use of SA in this area due to its unclear accuracy and typical convergence time. A main 
limitation is the significant computing time per individual SED model, especially when simulating the 
SEDs in an optically thick system. With the recent advance in computing performance, the SA approach 
is now applicable. The motivation behind this study is to evaluate advantages and shortcomings of these 
two methods when applied to an astrophysical optimization problem. We set up an unbiased benchmark 
by deriving synthetic observations from a known reference model to exclude any influence due to model 
uncertainties on the optimization process. We will present fitting quality and computational cost for 
this idealized optimization task using both approaches to provide a guide for practitioners to find a 
compromise between desired accuracy and available resources. 

2 REFERENCE MODEL AND MODELING 

Circumstellar disks surrounding YSOs, are considered as an essential step of the star-forming process. 
A lot of attention have been paid to these interesting objects, since they are most probably the birthplace 
of planet systems (e.g. jMundv et al.l2000tlMever et al.l2007l) . The planet formation mechanism in these 
disks and the properties of the resulting planetary system depend on the structure of the protoplanetary 
disks that can be constrained by SED modeling. In this section we introduce a reference model (RM) to 
mimic a virtual object located in the Taurus star formation region at a distance of 140 pc and describe 
our simulation technique. By using a virtual object, all uncertainties in regard to the model itself are 
eliminated. 



2.1 Disk Structure 



We employ a parametrized flared disk in which dust and gas are well mixed and homogeneous 
throughout the system. This model has been successfully used to expla in multi-wavelength observations 
of protoplanetary disks li ke the SED and high resolution images (e.g JWolf et al.ll2~003l:ISchegerer et al.l 
120081: ISauter et al.ll2009l) . For the dust in the circumstellar disk we assume a density structure with a 
Gaussian vertical profile: 



Pdust ~R a exp 
and a power-law distribution for the surface density 

S(i?) - R 



z 

2h? 



(1) 



(2) 



where R is the distance from the central star measured in the disk midplane. The proportionality factor 
is determined by normalising the total dust mass in the disk. The disk scale height h(R) follows the 
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power law 

W^-(icMu)"' (3) 

with the flaring exponent j3 describing the extent of flaring and the scale height hioo at a distance of 
100 AU from the central star. 

Table [T]lists the parameters of the RM. We trun cate the disk at 300 AU, a typ ical size found for T 
Tauri disks and fix the value of ftxoo 

to 10 AU (e.g.. lAndrews & Williamsll2007bl) . We consider a total 
dust mass of 5 ■ 10~ 5 M B , corresponding to the typical va lue found in T Tauri disks (e.g jBeckwifh et all 
1 19901 lAndrews & Williamsll2007at lAndrews et aU2009l) . We make a standard assumption for the dust- 
to-gas mass ratio, i.e., m,i us t / m gas = 1/100. For the flaring exponent, we adopt the value of 1.25 (e.g., 
iD'Alessio et al.|[l999l) . T he exponent of the dust de nsity profile a = 3(/3 — |) = 2.25 is derived from 
viscous accretion theory dShakura & Sunvaevll973l) . 

Table 1 Parameters of the reference model (RM) and the best-fit in our pre-calculated 
database. 



parameter RM best-fit model 
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2.2 Stellar Heating 

There are several heating sources of the circumstellar disks, such as irradiation by the central star, 
disk accretion and turbulent processes within the disks. To keep our model simple and decrease the 
number of fr ee parameters, we conside r a passive disk in which only stellar irradiation is taken into 
account (e.g. JChiang & Goldreichl 19971) . We assume parameters of a typical T Tauri star for the central 
source: it!* = 2 and = 4000 K, corresponding to a bolometric luminosity of ^0.92 L Q (e.g., 
iGullbring et allll998l) . 

2.3 Dust Properties 

We consider the dust grains to be homogeneous spheres, since the assumption of spherical grain is 
a valid approximation to describe the scattering behavior compared with a more complex and fractal 
grain structure. The dust grain ensemble incorporates both silicate and graphite material with relative 
abundances of 62.5% astronomical silicate and 37.5% graphite. To calculate the optical properties of the 
dust with the Mie scattering-theor y, we use the complex refracti ve indices of "smoothed astronomical 
silicate" and graphite p ublished by Weingartner & Draind d200ll) . For graphite, we adopt the common 



| approximation dDraine & Malhotrall 1993b " which means the extinction efficiency factor is com- 



:1 

3 ' 3 

puted by: 

1 2 

Qext, graphite = gQcxt(e||) + gQcxt(e±), (4) 

where eii and ej_ are the components of the graphite dielectric tensor for the electric field parallel and 
orthogonal to the crystallographic axis, respectively. 
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We assume a power law grain size distribution n(a) oc a -3 5 with a mnl < a < a max , where a 
represents the grain radius and a m [ n and a max are the minimum and the maximum grain radii. The size 
distri bution with amin = 5 nm and a max = 0.25 p,m is the well-known MRN distribution found for the 
ISM dMathis et al.lll977l) . For the RM, we keep a m ; n = 5 nm and incre ase the maximum grain size to 
q may = 2.5 /zm to account for dust growth in circumstellar disks (e.g.. ISauter et alj|2009l : iRicci et al.l 
2010). 



2.4 Radiative Transfer Simulation Code 



To deriv e the observ ables of the RM, we use the well-tested radiative transfer code MC3D devel- 
oped bv lWolj (120031) . Based on the Monte-Carlo method, MC3D solves the radiative transfer prob- 
le m self-consistently. I t imp lements the immediate temperature correction technique as describe d 
by iBiorkman & Woodl d200ll) and the continuous absorption concept as introduced by iLucvl d 19991) . 
Multiple and anisotropic scattering is considered in the simulations. 




Fig. 1 Left plot: the top two model fits in the database. The best-fit model is indicated as a 
blue solid line and the red squares represent the SED of the RM. Right plot: the moduli of 
flux discrepancies between the models and the RM. 



2.5 The SED of the RM 

The radiative transfer problem is solved at 100 wavelengths, logarithmically distributed in the wave- 
length range [50 nm, 2.0 mm]. The red squares in Fig. 1 show the simulated SED of the RM at 78 
wavelengths in the range [0.34 /im, 1.3 mm]. Since data points within this range can be obtained by 
current telescopes, we therefore only reproduce fluxes at these wavelengths in the fitting procedure. 

3 SED-FITTING WITH A PRE-CALCULATED DATABASE 

A popular approach to analyze SEDs is to solve the radiation transfer equation based on dust properties 
and a density distribution. Given a particular model and radiativ e transfer code, a d atabase of model 
SEDs can be established on a part of the model's parameter space dWoitke et al.ll2O10l) . 

The main advantage of this approach is that it allows to explore how specific data points influence 
parameters of the fit. As an example, given a particular dust model one (sub)millimeter data point can 
constrain the total dust mass. An additional data point can help to constrain the maximum dust radius 
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in protoplanetary disks. Moreover, observations of a large sample of objects can be fitted very fast to a 
pre-calculated database. 

In order to examine the fitting accuracy of a pre-calculated database, we adapted the MC3D code to 
the GPU cluster at Purple Mountain Observatory and built a database of 70,000 model SEDs using the 
same dust densi ty profile and grain com position as described in Section 2. We used a similar strategy as 
implemented by lRobitaille et al.1 d2006h to sample the parameter values in wide space to avoid giving any 
attention to a particular model, especially the RM. Generally, the model SEDs span a reasonable range 
of parameters constrained by theory and observations of protoplanetary disks. The effective temperature 
T* and r adius of the centra l star were derived by interpolating pre-main-sequence evolutionary tracks 
given by ISiess et all d2000h with the mass and age randomly sampled and logarithmically spaced in 
the ranges of [0.5 M Q , 1.5 M Q ] and [0.1 Myr, 20 Myr] respectively. The dust mass was sampled from 
a wide range of [10~ 9 , 10~ 3 ]M Q , but values between 1O~ 7 M and 1O _4 M were randomly selected 
more frequently. The model grid contains a majority of samples with disk outer radius between 100 AU 
and 400 AU. Other reasonable values consistent with observations are also considered. For one half of 
our models, we set the disk inner radius to the dust sublimation rad ius R su h = RJT^ u /Tl^)" 2,085 , 
where we take T su b = 1500 K to be the dust sublimation temperature (I Whitney et al.ll2004l) . The flaring 
exponent for these models is set to a smaller value than that found for T Tauri disks, i.e., /? < 1.25. 
This was done in order to interpret the observations of homologously d epleted disks in which m aterial 
dissipation is thought to occur throughout the disk simultaneously (e.g. JCurrie & K envon 2009). In the 
remaining half of models, the inner radius of the disk was increased to account for canonical transition 
disks that are expected to have large inner holes due to the mechanism of inside-out disk dissipation (e.g., 
IMuzerolie et al.ll2.010t) . The flaring exponent (3 for these models are uniformly sampled between 1.10 
and 1.30. The exponent of the density distribution a is calculated using the relationship a = 3(/3 — |) 
derived for a steady-state disk in which the temperature and surface density profile are coupled by 
p + q = 3/2, where q is the exponent of the temperature profile T^r~ q . The models with scale height 
parameter ft-ioo between 8 AU and 15 AU occupy a large portion, and other reasonable values outside 
this range are also contained in our database. The entire model grid was established in approximately 
30 days using 32 CPUs on the computer cluster. 

The RM and the grid in parameter space of the database are set up independently. We fit the SED 
of the RM to this database and examine whether the database returns a model with parameter values 
close to the reference. Additionally, the distance is fixed at 140 pc and no extinction is used to avoid any 
uncertainties from the employed extinction law. In short, we are only interested in the fitting quality of 
this approach. Fig.[T]shows the top two SED models in the database and the flux discrepancies between 
these models and the RM as well. The blue solid line represents the best-fit with a minimum \ 2 = 179.5 
calculated from 

2 _^ [F i -F i (BM)]' 2 
x 2^ AF 2 (RM) ' ( ' 

i—l 1 v ' 

Here, n is the number of data points, Fi is the simulated flux, F^(RM) is the "observed flux". The 
photometric uncertainty, AFi (RM), was assumed to be comparable to typical observation errors of real 
instruments working in different wavelength regimes, 10% and 15% of the flux at shorter and longer 
wavelengths respectively. This assumption has little impact on our study as along as the observation 
errors are not changed. Obviously, the best-fit SED is in good agreement with the SED of the RM. The 
corresponding parameter values of the best model are listed in TableQ] It is particularly noteworthy that 
the fitting results constrain the inner radius and dust mass quite well. The central star of the best-fit 
model is much brighter than the RM. However, the fluxes at short wavelengths are close to the RM. This 
can be explained by higher disk inclinations (e.g., the best fit i = 75°) for which the outer disk occults 
the centra l star and even the inner di sk regions, leading to a significant extinction for short-wavelength 
emission dChiang & Goldreichlll999h . The slight flux deficit in the far-infrared domain (^100 /im) can 
be explained by a smaller flaring exponent (f3 = 1.033) of the best-fit model as compared to the RM. 
Disks with larger flaring exponent intercept a larger portion of the central star's radiation due to a larger 
flaring angle and consequently re-emit more infrared flux. The maximum dust grain radius a max = 
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25 fim is one order of magnitude larger than the reference value. This shallows the spectral index of 
the best-fit model at (sub)millimeter wavelengths and weakens the 10 /im silicate band. However, the 
database contains only four values for a max , namely 0.25 /im, 2.5 fim, 25 fim, and 250 fim, hence the 
difference in <z max between the best-fit model and the RM is only one grid interval. 
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Fig. 2 Bayesian probability distribution of selected disk parameters. The triangles mark the 
probability bins containing the best-fit parameters. The vertical dashed lines symbolize the 
reference values. 



In order to estimate the confidence range for each parameter, we performed a Bayesian analysis 
using the reduced \ 2 defined by 

Xred = Tr 7, (6) 

JVdata - U - 1 

where A^ata = 78 and n = 10 are the num ber of data points and degrees of freedom respectively (e.g., 
iPinte et alj|2007l 120081: iBouv et alj|2008l) . The relative probability exp [~x 2 c d/ 2 ] of each parameter 



set is used as a statistical weight to calculate the relative likelihood for every parameter by summing 
over all models with a common value and normalising with the total sum. Non-uniform distributions of 
models in parameter space due to different sampling density are considered as the priori probabilities 
and have to be taken into account. Quantitative error bars for every parameter can then be deduced from 
the resulting probability distributions. 

Fig-Elpresents the probability distribution for some selected disk parameters. The results show that 
SED analysis places constraints on the parameters differently, e.g. only high inclinations (edge-on) can 
be excluded, while all other configurations appear to be equally likely. The scale height hioo and the 
flaring exponent /3, characterising the vertical geometry of the disk and hence its capacity to absorb 
stellar energy, are constrained to some extent. Our Bayesian analysis clearly supports the notion that 
SEDs mainly contain information on the inner radius and dust mass of circumstellar disks as we get 
the best constraints on these two parameters. Moreover, we noticed that the best-fit model implies a 
parameter value (e.g., j3) offset from the peak of its probability distribution. This is an indicator of 
model degeneracy often encountered in pure SED fitting. 



4 OPTIMIZATION WITH SIMULATED ANNEALING 



Simulated annealing (SA) is a variation of the Metropolis-Hastings algorithm that applies methods of 
statistical physics to optimization p roblems like the traveling salesman problem and has been use d for 
many optimization challenges (e.g jMetropolis et al.ll 1 95 31 : lHastingsll 1 9701 : [Kirkpatrick et al.fl983l) . The 
physical analogy of quenching a hot melt has given this extremely useful method its name as the main 
feature of SA is the gradual reduction in temperature during optimization. Basically, the distribution 
under scrutiny can be interpreted as the inner energy of a system in a thermal bath. By starting at a 
high temperature and subsequently cooling the bath while allowing the system to evolve through phase 
space the system trajectory intrinsically leads to regions of lower inner energy and thus to the global 
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optimum of the underlying problem. One of the major advantages of this Monte Carlo method is the 
independence from the dimensionality. Local extrema are overcome intrinsically, no gradients have to 
be evaluated and the parameter space can be discrete or continuous. We adapted this technique to search 
the optimal disk model for the synthetic SED derived from the RM. 

Table 2 Initial Step Widths and Boundary of the Parameter Space 



parameter 


fa 


min 


max 


T*[K] 


0.2T 40 


2500 


7000 




0.2 L* 


0.2 


8.0 


#in[AU] 


0.2i? in0 


0.1 


20.0 


i?out[AU] 


0.2 -Routo 


50 


1000 


a 


0.005 


1.5 


3.0 


P 


0.005 


1.0 


1.3 


ftlOOAU [AU] 


0.25 


5.0 


20.0 


m dust [M Q ] 
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10~ 9 


10" 3 


a m ax[/i m ] 


0.2 a ma xO 


0.25 


1000 


in 


2.0 


15 


85 



Notes: T*o, L*o, -RinO, Routo, M dust0 and a max o are the starting points. 



Table 3 Starting Points of the Markov Chains 



parameter 


I 
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IV 
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VI 
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2806 




0.93 
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0.32 


0.25 
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0.906 


0.613 
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0.17 


0.24 


0.45 


0.19 


0.14 


-Rout [AU] 


173.8 


139.4 


736.5 


69.0 


83.9 


117.3 


72.2 


61.3 
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2.124 


2.013 
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1.927 
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1.021 
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0.43 


if] 


44 
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22 


27 
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19 



4.1 Algorithm 

SA creates a Markov chain of points in parameter space by starting from an arbitrarily chosen set of 
parameters within the constrained range, see Table [2] The parameter intervals considered here are con- 
sistent with those for establishing the pre-calculated database. The Markov chain progresses from the 
starting point by generating new parameter sets depending on the current position until ideally the global 
optimum is reached. The parameter values of each step are recorded and can be used for subsequent 
analysis. We list 8 starting models in Table|3]with parameter values that deviate from the RM in varying 
degrees. After sampling a uniformly distributed number in the range [0, 1] using the randomu function 
in IDL programming language, we scale it to the corresponding parameter range to obtain the starting 
point in every dimension. The argument seed of the randomu function used to initialize the random 
sequence is chosen to be identical to the chain number. Moreover, the random sampling procedure is 
implemented in logarithmical space for parameters with large dynamical range, such as the inner radius, 
dust mass and maximum grain size. We define the starting points in this way to ensure independence 
from the RM and reproducible results. The applicability of SA to the optimization of SEDs of circum- 
stellar disks can be validated by comparing the disk models generated by the algorithm with the RM. 
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Moreover, we are interested in the number of Markov chains and steps per chain needed to find a good 
fit. The displacement Aa in parameter space is sampled from a gaussian distribution using varying step 

sizes f3 k : 



Prob(Aa) ~ JJ^ cxp 



2# 



(7) 



where the index k £ {1, D} enumerates the parameter dimensions. To ensure randomness of the 
sampling processes in our study, 4 different seeds were used to initialize the pseudo-random number 
sequence for each of the 8 starting points, resulting in a total number of 8 • 4 = 32 different Markov 
chains. 

After sampling a new step Aa, we accept or reject it by evaluating the acceptance probability using 
the difference in \ 2 between the last accepted step and the new position: 



A = min < 1 , exp 



Ax 2 



(8) 



We immediately accept a new step if the \ 2 at the new position is lower than at the actual position. 
If we only accept choices with A = 1, the Markov chain converges to the next local minimum that is 
not necessarily identical with the global optimum. Instead, a uniformly distributed number u £ [0,1] 
is sampled and compared with A. The new proposed position is accepted if u < A and the chain has 
the chance to escape a local minimum depending on the actual temperature. The crucial role of the 
temperature r is discussed in the next section. 



4.2 Annealing Schedule 

The random walk through parameter space not only moves to better models with smaller x 2 but also to 
worse models with a probability ~ cxp[— Ax 2 /t]. The system temperature r is hence the major control 
parameter of the random walk and is gradually reduced to steer the Markov chain to the vicinity of the 
global optimum. It can be show n in the case of logarithm ic cooling that the Markov chain will reach the 
global optimum asymptotically dGeman & Gemanlll984l) . Unfortunately, no upper bound for the time to 
reach the optimum with this st rategy can be given and a var iety of cooling schedules have been proposed 
to achieve faster convergence dNourani & Andre senlfl998l Monotonic cooling, like in exponential and 
linear schedules, is used in most applications of SA. The chain starts at a high temperature, runs through 
a melting phase to reduce any bias from the chosen starting point, and is then successively cooled to 
reduce the probability of moving to worse parameter sets until it freezes at a location in parameter space. 

However, monotonic cooling schedules always contain several free parameters that must be adjusted 
empirically to control the duration of the optimization . For this study we implemented a non-monotonic 
schedule by setting the chain temperature r after each accepted step to 

r = 7-xLf (9) 

Here, 7 ~ 0.25 is a fixed parameter controling the probability of taking an uphill step (e.g. jLocateilil 
l2000h . Without any user interaction during the optimization procedure, our approach enables a chain to 
escape a local minimum. 



4.3 Adaptive Step Sizes: 

The step size j3k controls the displacement between the two adjacent positions in parameter space. 
Optimal step sizes cannot be defined at the beginning due to the lack of any information on the distribu- 
tion under investigation. Fixed step width is not an appropriate solution because this approach can waste 
computation time if the chosen value are too large compared to the local shape of the merit function, 
leading to a low efficiency of the algorithm. If the increments are too small, the Markov chain converges 
slowly and the escape from a local valley becomes unlikely as too many steps are needed. The step 
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sizes for each of the parameters hence have to be adapted along the run. For this purpose, the reasonable 
initial values that are dependent on the starting points of the Markov chain are listed in Table [2] 

We define a local acceptance ratio of the chain using the sequence r/ n <E {0, 1} of rejected and 
accepted steps by calculating 

1 " 

£n = T *hn ( 10 ) 

m— n — / + 1 

for the constant lookback length I < n, where n re fers to the step coun t. In order to maintain the op- 
timal acceptance ratio in the vicinity of £ = 0.234 dRoberts et al.ll 19971) . we implement a control loop 
that adjusts the step widths by analysing the last I = 25 positions of the chain. The algorithm calcu- 
lates the moduli of relative changes between the proposed parameter a n> k and the previously accepted 
parameter a„_ lfc : 



a>n-i,k 

By separately summing up the relative changes c Hl k for every parameter in accepted and rejected pro- 
posals, two vectors g n ^ and b n ,k can be derived to describe the impact of good and bad decisions: 



9n,k = VmC m ,k, b ni k 

■m—n—l+l 



E 



( 1 l]m ) Cm , k 



(12) 



if £n > £o, the step size of the parameter with the smallest component in g n _k is increased to encourage 
riskier proposals. On the contrary, if £„ < £o the step size of the parameter with the largest component 
in b nt k is decreased to induce more conservative proposals. In either case the step size of the selected 
parameter is adjusted by multiplying with the factor of (1 — £o) + £«■ Moreover, for parameters spanning 
several orders of magnitude, like for rridust and a max , we adapt their step widths using a logarithmical 
scale. 

Additionally, the step sizes are controlled to be < 50% and > 1% of the current parameter values 
in order to avoid large fluctuations and extremely slow convergence of the Markov chain respectively. 



4.4 Chain Abortion 



Since no upper bound for the step count to reach the global optimum in S A can be given, we must define 
a reasonable criterion to stop an optimization run. When the system temperature r drops, the Markov 
chain can get trapped at a location in parameter space if surrounded by sufficiently steep gradients as 
any deviation from the local minimum will yield large Ax 2 , see Eq. 8. 

As the system temperature is coupled to the x 2 of the current position, see Eq. 9. Hence, the prob- 
ability to escape a local minimum does not converge monotonically to zero as the probability to climb 
uphill remains constant. Hence, it is not straightfo rward to decide when to abo rt the chain using a tem- 
perature threshold as normally implemented (e.g.,|Nourani &Andresenfl998l) . Instead, we analyze the 
quality of the fit step by step for every accepted model of an individual Markov chain. The variation of 
the fitting quality is evaluated by calculating the modulus of the difference A% 2 between two adjacent 
accepted steps. A particular Markov chain is aborted if the box average (A% 2 ) remained below a varia- 
tion threshold for a certain number of steps A^bort in a row. We used Ax 2 bort = 67 and iV~100 steps 
and averaged with a box of 10 samples, see Fig.|3](a) for examples of chain lib and IVd- The value for 

the threshold is derived from (x 2 ed ) = jv^-i-i = 1' as ^data = 78 and n = 10 correspond to the 
number of data points we fit and the degrees of freedom of our model respectively. Here, we empha- 
size that our criterion of chain abortion, especially the defined threshold Ax 2 bor t' is n °t universal. One 
should choose an appropriate solution depending on the \ 2 -distribution of the problem at hand and the 
required accuracy. 
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250 



accepted step count accepted step count 

Fig. 3 x 2 an d selected model parameters plotted against the number of accepted steps for 
the Markov chains lid and IVd- The horizontal dash lines in the upper and lower panel of 
diagram (a) depict the best-fit \ 2 from the database and the level of (A^ 2 ) = 67 respectively. 
Diagram (b) depicts the inner radius (upper panel) and the dust mass (lower panel) of the 
accepted models. The dashed lines in diagram (b) symbolize the reference values. 



Table 4 Comparison of the best \ 2 obtained with optimization starting with four different 
seeds from six locations given in Table [3] 



chain 


a 


b 


c 


d 


I 


45 


72 


56 


89 


11 


82 


39 


12 


68 


III 


504 


236 


293 


693 


IV 


39 


127 


375 


11 


V 


34 


262 


165 


31 


VI 


85 


52 


127 


64 


vn 


106 


31 


527 


22 


VIII 


351 


95 


71 


166 



Notes: (1) x 2 from SED-Fitting on the basis of database is 179.5. (2) The subscript letters from a to d represent different seeds 
used to initialize the pseudo-random number sequence. 

4.5 Fitting Results 

Thirty-two CPUs were used to perform the optimization and each Markov chain was propagated on a 
single CPU. The typical number of steps for a chain before abortion is ~1, 200. Because the calculation 
time per iteration highly depends on the optical depth of the proposed model, the total run-time of 
an individual chain can vary significantly. On average, it took about 25 days to complete one chain, 
comparable to the time to establish the database. The x 2 , i^X 2 ) an d some model parameters of the 
Markov chains Ilb/IVd are plotted in Fig. [3] versus the accepted step count. It is clear from this plot 
that the merit function decreases quickly during the optimization run and then remains in the vicinity 
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of ^50. Both R m and A/dust gradually converge to the value of the RM. This behavior is the base 
for the criterion described in Sect. 4.4 for chain abortion. Parameters like i? ut that are difficult to be 
constrained by the observations, does not converge in an obvious pattern but instead cover an extended 
interval. 

The minimum \ 2 for the best fit of each Markov chain are listed in Table [4] By comparing these 
values to the minimum x 2 of ~179.5 obtained from fitting the SED to the database we can evaluate the 
quality of fit. Most Markov chains found parameter sets with significantly lower x 2 as compared to the 
result from the database, demonstrating the applicability of SA in the field of circumstellar disk SED 
modeling. 

More quantitatively, 75% of the Markov chains found "improved" models compared to the best fit 
in our pre-computed database. Here, we use quotation marks to caution the reader that better x 2 not 
necessarily translates into disk parameters closer to the RM as the model is degenerate. 

Table 5 The best-fit disk parameters for the RM found by SA and the derived error intervals. 



parameter best-fit 

T* [K] 4010+25 

L* [L ] 0.93tg;jg 

R ia [AU] 1.95+ ' 35 



0.14 

-52 
-28 



Rout [AU] 480; 

2 o 4 +0.091 

a 1 24+0-069 

h 100 [AU] 11.58±?;^ 

m dmrt [1O- 5 M ] 4.57l ;™ 

<W [Mm] 3.5t°;^ 

i [°] 60.9±l 9 2 



4.6 Estimation of Local Error Intervals with SA 

Estimation of the uncertainties for the best-fit parameters using SA are quite different from the pre- 
viously described Bayesian analysis. First, a confidence interval Xconf nas to ^ e defined from already 
calculated models. The deduction of this bound relies on the practitioners experience as no analytic so- 
lution can be given. Secondly, the vicinity of the best-fit in parameter space is probed by starting one or 
several Markov chains from this location. After collecting sufficient parameter sets below the confidence 
threshold, this normally asymmetric domain can be characterized by taking the minimum and maximum 
value of each degree of freedom. 

We started a new Markov chain from the best-fit model of chain IVd to sample the x 2 -distribution. 
The low starting temperature of t~3 (see Eq. 9) retained the random walk to the vicinity for suffi- 
cient steps to examine the local optimum. We gathered ^500 samples with this new chain below the 
confidence threshold Xconf defined by 

Xconf - Xl <3(JV d ata-n-l), (13) 

where x 2 1S the new overall minimum \- 2 found during the restart run dRobitaille et alj|2007l) . Again, 
-Wdata = 78 and n = 10 denotes the number of data points and dimensionality of the model. Table 
summarizes the best-fit model of chain IVd, the error bounds are deduced from the minimal and 
maximal components of all sampled parameter sets. 
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We like to emphasize that the error bounds deduced with this method have to be interpreted as a 
local confidence interval as it is only based on a Markov chain probing the direct environment of the 
best fit while the Bayesian analysis is more appropriate for a global error estimation by taking the whole 
database into account. 

5 DISCUSSION 

There are generally two explanations for discrepancies between SEDs derived from the best-fit model 
and the RM. First, the SED is a spatially integrated observable and high-dimensional disk models are 
often degenerate. Hence, fit ting of such models can only provide weak constraints on the disk structure 
(e.g.. iRobitaille et"ai1l2007l) . Experience from previous modeling campaigns unambigously indicates 
that additional observations, especially multi-wavelength, high re solution images, are crucial to over- 
come model degeneracy (e.g. JPinte et al.ll2008tlSauter et al.ll2009l) . Secondly, it is impractical to search 
on a sufficiently fine grid in parameter space as realistic models usually contain too many dimensions. 
The SED fitting problem is therefore an excellent showcase to assess optimization methods for astro- 
physical modeling. We will discuss in the following paragraphs the highlights of our study on fitting 
with a pre-calculated database and simulated annealing. 
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Fig. 4 An i l lustrat ion of hybrid approach to model SEDs of two transition disks identified by 
ICieza et al.l(l2010l) . in their nomenclature TRAN 14 (left plot) and TRAN 15 (right plot). The 
best-fit model in the pre-calculated database is indicated as a blue dashed line. The red solid 
line represents a highly improved model found by SA based on the results informed by the 
database. Observations are overplotted with black squares. 



Database vs. SA: We implemented two different approaches to fit SED data of the RM, the use of a 
pre-calculated model database and SA, and compared these two methods extensively. 

After construction of the database, a single set of observations can be fitted very fast by calculating 
the x 2 °f every model in the database. The result may be only an approximation but the database enables 
us to deduce global error intervals for all parameters with the Bayesian inference method. It evaluates 
the probability for each parameter by weighing all models in the database. The fitting accuracy highly 
depends on the grid resolution of the models. Unfortunately, the number of grid points increases expo- 
nentially with the dimensionality of the parameter space. It is hence necessary to make a compromise 
between grid resolution and available computational resources. 
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Our results show that in most cases SA finds parameter sets with a \ 2 approximately half the size 
of the solution from the pre-calculated database, indicating a higher fitting quality on average. However, 
some Markov chains are unable to reproduce the SED quite as well during the allocated timeframe 
as their siblings. This is caused by the stochastic nature of Monte Carlo methods and should be taken 
into account by using sufficient Markov chains in parallel to perform the optimization. The increasing 
computational speed of multicore CPUs ameliorates this embarrassing/parallel problem but the user 
will have to find a compromise to retain the calculation time in a tolerable range too. SA is a sequential 
process and the Markov chain may converge slowly. The fitting of SED data for a sample of objects can 
therefore be a very time consuming task, even for a medium computer cluster with hundreds of CPUs. 

Hence, using SA and a pre-calculated model database have their individual merits for solving astro- 
physical fitting problems. SA is often better suited for individual cases whereas a database is preferable 
in a sample study, and the practitioner has to choose by weighing the computational constraints against 
the required accuracy. One can of course use a hybrid approach by taking the best-fit model from the 
database as starting point for SA. Moreover, the most probable value for each parameter indicated by 
the peak of Bayesian probability distribution is also a good starting choice for SA. An illustration of 
this idea is g iven in F ig.|4]for fitting SEDs of two "transition disks" in the Ophiuchus molecular cloud 
identified by ICieza et aL I j201 Oh . in their nomenclature TRAN 14 and I RAN 15. The quota tion mark 



here is used to remind that the selection criterion for transition disks used bv lCieza et al.l (1201 Ol) is broad. 
Six different Markov chains starting from the best-fit, the most probable parameter set in our database 
or their surroundings were used to optimize the fit. After we performed ^150 accepted steps for each 
Markov chain, the quality of the fit for both objects were highly improved. 

Starting point for SA: The influence of the starting point on the probability to encounter the vicinity 
of the global optimum is obvious. In our study, the starting location III features the largest distance in 
parameter space to the RM, see Table [3] Consequently, more steps are required to reach the RM. The 
corresponding chains therefore has more difficulty in finding better solutions than the best fit from the 
database compared with other optimization runs. As some parameters of astrophysical models can be 
constrained by observations, we suggest to choose the starting values accordingly. 



6 SUMMARY 

We have compared in detail the fitting of continuum SED using a pre-calculated database and SA, a 
Markov chain Monte Carlo method that has been successfully employed to solve many optimization 
problems outside the field of astrophysics. All the simulations are performed with the radiative transfer 
code MC3D. Our study shows the practical application of SA in the context of circumstellar disk SED 
modeling and compares fitting quality and efficiency of both approaches. Because the same radiative 
transfer code, disk structure and dust composition are used to calculate the database and the Markov 
chain, the comparison of fitting quality reduces to a comparison of resulting \ 2 - We show that SA typ- 
ically finds better solutions than the database, directly demonstrating the applicability of this algorithm 
to the modeling of SED data. However, a database can quickly evaluate the overall uncertainty of all 
parameters and provide a good starting point for SA. A hybrid approach to combine both methods leads 
to more accurate solutions. 
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